Module 15 - Pre-Trained Models

Overview

A recent development in deep learning is the advent of pre-trained models based on the transformer architecture. These are extremely large neural networks trained on enormous datasets, usually of images or text (in the text case, often a scrape of much of the internet). The initial training of these models is laborious, and state-of-the-art models can be trained only by very large, well-funded organizations. However, it is possible to make use of pre-trained models for a variety of purposes, including adapting them to new tasks via transfer learning or reinforcement learning (i.e., fine-tuning). This week we will discuss these models, the transformer architecture, and how to use them to solve problems.
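The core operation of the transformer architecture mentioned above is self-attention. The following is a minimal NumPy sketch of scaled dot-product attention; the token count and embedding dimension are illustrative assumptions, not values from any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to every key; the output is a weighted
    average of the values, with weights given by a softmax over
    the scaled query-key similarity scores."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (queries x keys) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# Toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one output vector per input token
```

Real transformers stack many such attention layers (with multiple heads and learned projections for Q, K, and V), which is what makes the models so large.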

Learning Objectives

  • Transformer architecture and pre-trained models
  • Reinforcement learning and fine-tuning
  • Applications and practical implementations
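The fine-tuning idea from the objectives above can be sketched in a few lines: freeze the pre-trained network and train only a small new head on its outputs. Here the "pre-trained" extractor is just a fixed random projection standing in for a real model, and the data, dimensions, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a frozen pre-trained network: a fixed, untrained projection.
W_frozen = rng.standard_normal((10, 5))

def extract_features(x):
    return np.tanh(x @ W_frozen)  # frozen: never updated during fine-tuning

# Toy binary classification data.
X = rng.standard_normal((200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Trainable head: logistic regression on the frozen features,
# fit by plain gradient descent.
w, b, lr = np.zeros(5), 0.0, 0.5
F = extract_features(X)
for _ in range(500):
    z = np.clip(F @ w + b, -30, 30)   # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))
    w -= lr * (F.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

acc = np.mean((p > 0.5) == (y > 0.5))
print(f"training accuracy: {acc:.2f}")
```

In practice the frozen part would be a genuine pre-trained transformer and the head a small neural network, but the division of labor is the same: the expensive representation is reused, and only a tiny number of new parameters are trained.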

Readings

  • TBD

Videos